Robert Borges 
Abstract 


This paper discusses the use of rapid automatized picture naming (RAN) in 
the assessment of proficiency among new speakers of endangered 
languages. Although measuring proficiency among new speakers 
is crucial for developing didactic materials and for understanding 
language change, a number of practical issues often limit 
the usefulness of traditional language evaluation methods. This paper 
investigates the potential of RAN assessments to provide a suitable 
indication of language proficiency by means of accuracy (ability to name 
pictures), speed (how quickly a verbal response is produced), and cognitive 
control (how well the speaker mediates cognitive load while performing the 
task). Results from RAN assessments administered among new speakers of 
Wymysorys, in concert with other data collection procedures, indicate that 
this type of task provides accurate insight into speakers’ proficiency. 
In particular, latencies in bilingual picture naming reflect 
speakers’ proficiency as a function of the relative degrees of language 
entrenchment. However, increasing cognitive load during the assessment 
via speed of cue stimulus and frequently switching trial language showed 
no effect relative to the proficiency rank order established by naming 
accuracy and speed. 

Keywords: endangered languages, language assessment, language change, 
language proficiency, rapid automatized picture naming (RAN) 


1. Introduction 


Assessment of proficiency is a crucial aspect of the study of language 
acquisition. In children, language is acquired in a naturalistic, stage-like manner 
with regular age-determined norms for the acquisition of structures, 
depending on the language being acquired (Meisel, 2011). In second 
language acquisition (SLA), acquisition also progresses in a stage-like 
manner; however, these stages are mediated by other factors, such as first-language 
interference, age at onset of learning, effort invested in learning, learning 
strategies, and the quantity and quality of input (Ellis, 1997; Smith, Truscott & 
Hawkins, 2013). From a didactic point of view, understanding the 
progression of an individual’s proficiency level allows the educator to 
determine placement in a group of similarly developed individuals, who in 
turn receive structured input targeted to aspects of language that are still 
lacking. From a research perspective, the ability to track the trajectory of 
individuals’ proficiency development will allow for a better understanding 
of the relationship between language variation and change. 


One approach to the still-open question in sociolinguistics regarding the 
precise role of synchronic variation in diachronic language development 
(Léglise & Chamoreau, 2013) is to study new speakers of endangered 
languages in terms of their acquisition of proficiency, lexical and structural 
variants in their speech production, and the social networks within which 
endangered-language practices are situated. Such contexts provide an ideal 
venue to investigate the development of interlanguage (Selinker, 1972) and 
its potential role in wider language change. Since new speakers tend to 
occupy an active role in the community and represent a larger and more 
influential proportion of the overall speech community than learners of 
non-threatened languages, the context is conducive to the observation of 
interlanguage features that ascend to the status of model for other language 
learners, replacing the “native” variety as target, thus instantiating language 
change. In order to make claims in this regard, it is necessary to know 
whether observed feature variants result from the acquisition process (i.e. a 
speaker compensates for incomplete or inadequate acquisition of target 
structures) or if variants have become part of that speaker’s competence. In 
order to assess individual speakers’ competence, it is necessary therefore to 
implement a measure that does not rely on normative ideas about language 
correctness. This study presents results of an initial attempt at measuring 
language proficiency via Rapid Automatized Picture Naming (RAN). The 
study was conducted in Wilamowice (Poland) with new speakers of 
Wymysorys, a critically endangered West Germanic language. 


Language assessments usually attempt to elicit a well-rounded picture 
of an individual’s abilities in a variety of practical situations. Two 
well-known examples of assessments for second-language speakers of English, the 
Cambridge Proficiency test and the Test of English as a Foreign Language 
(TOEFL), contain similar sections: reading comprehension, with written 
multiple-choice or fill-in-the-blank questions; a writing section, where one 
must demonstrate mastery of the written language in terms of grammar and 
vocabulary, but also structuring and supporting argumentation; listening 
comprehension, usually accompanied by some written form of questions; and 
speaking, with monologues and dialogues. This strategy is typical of other 
English-as-a-foreign-language assessments as well as of assessments for 
learners of other languages. 


Although typical, these methods have been criticized for a variety of 
reasons, including cultural insensitivity, influence on pedagogics (teaching 
for the test), test apprehension, and so on (Gee, 1989). In endangered 
language contexts, there is an additional set of problems with assessing 
proficiency using such strategies. For instance, there may be no widely 
accepted standardized variety, such that an evaluation of an individual’s 
speech production may reflect differences in language variety rather than 
that person’s actual abilities¹. Notions of “correctness” from a pedagogical 
point of view are irrelevant to the entrenchment required in the development 
of proficiency. Similarly, a language may not have an established 
orthography, may have multiple competing orthographies, or not be written 
at all. Finally, and especially true of the case presented in this paper, there 
may be insufficient human resources both to produce a listening 
comprehension test and to conduct on-the-spot evaluations of spoken 
language capabilities. Quite plainly, there are not enough proficient 
individuals consistently available to construct and manage a comprehensive 
standardized test. As a result of these issues, an alternative approach for 
assessing entrenchment and proficiency has been sought. 


In the remainder of the paper, the procedures and results of the RAN 
assessment are outlined. Section 2 provides brief information about 
Wymysorys, its speakers, and the context that led first to endangerment of 
the language and its subsequent revitalization. The following section 
discusses the concept of new speakers of endangered languages and their 
potential to inform understandings of language change. RAN tests are 
introduced in this section and a brief outline of their usage and findings in 
psycholinguistic experimentation is given relating to the potential usage in 
proficiency assessment. Based on these insights, the hypotheses tested in 
the study are presented. Section 4 describes the data collection protocols 
and operationalization of the RAN assessment used in this study. Section 5 
discusses the analytical procedures applied to data generated by the RAN 
assessment and presents results from these analyses. Implications of these 
analyses are discussed in terms of the usefulness of RAN as a proficiency 
assessment tool in Section 6. The paper concludes with an evaluation of the 
hypotheses presented in Section 3 and with recommendations for further 
testing and application of RAN as an assessment of language proficiency. 

¹ Or worse, that the ascription of one variety as “standard” contributes to the demotivation of learners 
who align with non-standard varieties as well as to the further marginalization and endangerment of 
other varieties. 


2. Language Context 


Wymysorys (also Eng. Vilamovian, Pol. język wilamowski, endonym: 
Wymysiöeryś) is a West Germanic language, spoken primarily in and 
around one town, Wilamowice, in the Silesian Voivodship (Bielsko 
county) of Poland (Wicherkiewicz, 2003; Hammarström, Forkel, & 
Haspelmath, 2018). The area was settled in the 13th century by migrants 
from Western Europe, most probably originating from Frisian areas around 
the Elbe and / or Flanders. Because the area was quite thinly populated, 
settlers were invited from overpopulated Germanic speaking areas, and 
provided financial incentives to relocate by local nobility (Barciak, 2001, p. 
82-85). From that time, a unique multilingual culture developed and 
flourished in the area. Throughout its history, Wilamowice and its people 
straddled borders, and residents of the town utilized this position by 
establishing wide-reaching trade networks, especially dealing in textiles 
and horses (Wicherkiewicz, 2003, p. 10). The townspeople lived within a 
system of functional polyglossia where Wymysorys was widely used in the 
home and private situations, Polish was used for religion and education, and 
later (under Austrian administration) German was used for commerce and 
administration (Ritchie, 2012; Ritchie, 2016; Wicherkiewicz, 2003; Neels, 
2012). 


The vitality of Wymysorys became severely threatened following the 
Second World War. During the war, residents of the town were ascribed 
the status of Category 2, “of German descent”, or Category 3, “voluntarily 
Germanized”, on the Deutsche Volksliste ‘German People’s List’ 
(Wicherkiewicz, 2003). In principle this was voluntary, but in practice those 
who did not volunteer faced severe punishment. Despite the fact that 
Wilamovian people did not identify with Germany or German-ness, an idea 
for which there is pre-war evidence, the Red Army and post-war communist 
government used the Volksliste as a weapon against those who had been 
ascribed to the list (Wicherkiewicz, 2003, 2001). In the case of 
Wilamowice, this meant that the language and any culturally distinct 
expressions (e.g. folk costumes) were banned outright in 1945; those who continued to practice 
the language and culture faced eviction, imprisonment, exile, or execution 
(Wicherkiewicz, 2003). As such, community members were required to 
hide their identities, even within extended families, in order to survive. As a result, 
people ceased using the language and teaching it to their children; 
intergenerational transmission was abruptly halted. 


Wicherkiewicz’s ominous prediction (Wicherkiewicz, 2000, repeated 
in 2003 and other works) that Wymysorys would not survive the next 
decade formed part of the motivation that led the young Tymoteusz 
Król (b. 1994) to begin recording audio of the language as spoken by his 
grandmother and her friends, eventually amassing around 800 hours of 
audio recordings of elderly speakers of Wymysorys, many of whom are no 
longer living. Around 2007, Król and his close friend Justyna Olko (2016), 
recognizing the severe lack of didactic materials for Wymysorys, began 
developing these materials based on the audio recordings and teaching the 
language to other children on a private basis. Some of these didactic 
materials were eventually published (Majerska, 2014; fum Dökter, 
Wicherkiewicz & fum Bidetul, 2015; Król, Majerska, & Wicherkiewicz, 
2016). Thanks to the continued efforts of Król and Majerska, along with 
subsequent institutional support from Polish universities and the European 
Union², there are now approximately twenty-five individuals who 
self-identify as new speakers of Wymysorys. Teaching of Wymysorys continues 
on a private basis, though there have been intermittent instances of the 
language being taught as an extracurricular activity in the local elementary 
school and even at the University of Warsaw, and active communities of practice 
(Wenger, 1999) have developed around a local cultural heritage association, 
theatre group, and folk music/dance troupe. 








These actions have sparked promising developments with regard to the 
survival of Wymysorys. In addition to the growing number of active new 
speakers, there has been a shift in attitudes towards greater 
acceptance of the language within the community and in the wider society. 
Anxiety surrounding the use of language and local customs brought about 
by post-war events is easing. Neels describes the prevalence of “double 
identity” among older speakers (Neels, 2012, pp. 128-31), which is also 
strongly evident among the new speakers who participated in this study. 
Local activists struggle for recognition from the Polish government as a 
linguistic minority, but the association with “Germanness” and the 
Volksliste continues to be used as a tool for marginalization³. Those new 
speakers who participated in the current study report that they continue 
their engagement with local language and culture regardless. 

² Notably, the projects Linguistic heritage of Poland 2013-2014, financed by the 
National Humanities Program of the Polish Ministry of Science; Endangered languages. 
Comprehensive models for research and revitalization 2013-2016 (see Olko, Wicherkiewicz, & 
Borges, 2016), financed by the Polish Ministry of Science under agreement 
0122/NPRH2/H12/81/2013; Documentation of Linguistic and Cultural Heritage of 
Wilamowice 2014-ongoing, financed by the National Humanities Program of the Polish Ministry of 
Science; and Engaged Humanities in Europe 2016-2019, financed by the European Union under the 
Horizon 2020 programme, agreement 692199. 


3. Theoretical Context and Hypotheses 


A new speaker can be defined as an individual who has learned a language 
through educational programs outside the home, with little or no exposure 
in the home, after a community-level shift (O’Rourke, Pujolar & Ramallo, 2015). 
As the concept is a relatively recent addition to sociolinguistics, a number of 
recent publications (O’Rourke & Ramallo, 2013; O’Rourke & Pujolar, 
2013; Hornsby, 2015; Nance, 2015; O’Rourke, Pujolar & Ramallo, 2015; 
O’Rourke & Pujolar, 2015; Hornsby, 2016, 2017; Nance, McLeod & 
O’Rourke, 2016; Kasstan, 2017; Dołowy-Rybińska, 2016, 2017; 
Smith-Christmas, Hornsby & Moriarty, 2018) have focused on areas of shared 
experience, language practices, and language attitudes and ideologies, 
specifically in terms of language purism, authenticity and ownership. 

















The study of new speakers provides a number of additional unique 
opportunities for linguistics as a discipline which so far have not been 
explored. Of primary importance here is that new speakers’ position and 
role within the endangered language community tend to be more prominent 
in terms of influencing norms of speech behavior than those of a learner of a healthy 
majority language. In some cases, for example, new speakers outnumber 
remaining “native” speakers, and in extreme cases of language 
endangerment, or those where the last “native” speakers have already passed away or 
will soon do so, new speakers are, or are set to become, the speech 
community. These observations lead to the hypothesis around which this 
work is based, namely that the study of new speaker groups will allow for 
observation of the instantiation of linguistic innovation and the spread of 
these innovations within the speech community in situ. 


In order to address this hypothesis, the interplay of variation and 
language change should also be understood in terms of the causes of 
linguistic variation. Socially and geographically conditioned variation 
aside, in SLA contexts, new speakers included, a major source of 
linguistic variation is the acquisition process itself. Thus, it 
becomes necessary to understand whether variation in an individual idiolect at a 
given moment is representative of that person’s ongoing development, or whether his/her 
development has plateaued as a stable idiolect. It is therefore necessary to understand 
where people are in terms of their proficiency development, such that it can 
be determined whether a given idiolect is stable enough to be considered an 
interlanguage variety that could potentially serve as a model for other users, and 
whether common feature variants are replicated from another idiolect or 
interlanguage, or whether they develop independently in the acquisition 
process. 

³ The author had the opportunity to address the Polish Parliamentary Commission for National and Ethnic 
Minorities as a scientific expert in 2016, and was instructed by the commission to specifically 
provide argumentation against the idea that Wymysorys is a “dialect of German”. 


Proficiency development involves more than learning words and 
grammar, however. Proficient individuals must learn how to negotiate the 
resources available to them. Studies show that all languages known by a 
multilingual individual are active all the time, and in order to utilize a single 
language, items from non-desired languages must be cognitively inhibited 
(Kroll, Gullifer, Mudry & Martin, 2015; Green & Abutalebi, 2013; 
Hermans, De Bot & Schreuder, 1998; Herdina & Jessner, 2002). One 
potential component of a proficiency assessment, moving towards a more 
online measurement, is Rapid Automatized Picture Naming. These tests 
have arguably been shown to tap into the neural mechanisms responsible 
for aspects of language processing (Lervåg & Hulme, 2009), and can thus 
be utilized to examine the degree to which linguistic representations are 
entrenched in an individual’s executive functions. 


Picture naming tasks are widely used in a variety of research areas, 
including psychology, psycholinguistics, bilingualism research, and 
speech-language pathology. The picture RAN test utilized here has its roots 
in the latter, when Geschwind and Fusillo (1966, quoted in Denckla & 
Cutting, 1999) used color chips to assess language deficiencies in adult 
stroke survivors; these individuals were unable to name colors, despite no 
evidence of color blindness and being able to identify the colors via 
matching tests. This type of rapid naming task, where a set of colors was 
shown with the instruction to name them sequentially, was repeated in a 
number of studies for different purposes and with different stimuli (letter 
and number graphemes, whole words, colors, and photographs/images of 
familiar items). The neural circuit idea that developed through observations 
with stroke victims led to the idea that RAN tests could be widely 
administered and that normative results could serve as predictors of cognitive 
function, especially reading abilities (Denckla & Cutting, 1999). 


Early on, it was observed that latencies in digit naming (time from 
presentation of numeral stimulus to producing response) among age cohorts 
correlated with word recognition, and served as a predictor of reading 
abilities; automaticity in character recognition is a key prerequisite for 
the higher-level processes of word recognition and reading (Spring & Davis, 1988). In 
a similar study, Kail, Lynda & Bradley (1999) argue that naming latencies 
reflect global development; naming and reading both depend on the rapid 
execution of underlying cognitive processes. In other words, the access to 
memory and automaticity in processing are key elements of both tasks. 
Although studies like these have often been reproduced, the validity of naming 
speed as an indicator of reading abilities remains debated, especially in 
terms of the details of stimuli and presentation. More recently, however, a 
neuro-imaging study provides evidence in support of discrete RAN tests 
(one stimulus at a time) with image stimuli as an indicator of reading 
abilities, discriminating both poor and above-average readers from average 
readers (Cohen, Mahe, Laganaro & Zesiger, 2018). 


Picture RAN tests have been used in a number of psycholinguistic and 
bilingualism studies (some notable examples: Sholl, Sankaranarayanan & 
Kroll, 1995; Hoshino & Kroll, 2008; Hermans et al., 1998; Gollan, Starr & 
Ferreira, 2015; Costa & Santesteban, 2004). In these studies, latencies are 
also the key component. As with reading abilities, naming latencies become 
shorter in an L2 with the increased entrenchment associated with proficiency 
development. L2 acquisition also affects L1 naming. Ransdell and Fischler 
(1987), for example, showed that bilinguals were generally slower in picture 
naming tasks using their L1 than monolingual peers. Gollan, Montoya, 
Fennema & Morris (2005) found similar results, but also illustrated that 
bilinguals’ latency sped up with immediate repetition of the tasks, 
suggesting that entrenchment can be conditioned in a relatively short time. 
The cost of bilingualism in terms of L1 naming latencies is mediated by 
proficiency in the L2, where increased proficiency in L2 is costly to global 
L1 processing, especially in cases of immersion and / or drastic increases in 
L2 exposure (Costa & Santesteban, 2004; van Hell & Tanner, 2012). 





Naming latencies can also be mediated by other factors. For 
example, frequency effects (more frequently used words elicit 
shorter latencies) have been observed in word recognition tasks (Diependaele, Lemhöfer & 
Brysbaert, 2013; de Groot, Borgwaldt, Bos & Eijnden, 2002). Meuter 
(1999) points out that switching languages between trials in RAN tests leads 
to longer latencies in L2 trials, especially in individuals who have weaker 
L2s. High proficiency bilinguals are reportedly not subject to these slower 
latencies in either of their languages (Costa & Santesteban, 2004; Gullifer, 
Kroll & Dussias, 2013), however, suggesting that entrenchment, or strength 
of linguistic representations in executive function, is the crucial factor in 
both naming speeds and balanced bilingualism. 





On the basis of these insights, a number of expectations can be 
postulated regarding the use of picture RAN tests for assessing 
entrenchment of individuals’ languages. Firstly, study participants’ 
accuracy rate in naming can be used as a proxy for their vocabulary size 
and relative exposure to the language. This exposure is a necessary 
prerequisite for entrenchment. Secondly, with increased proficiency in the 
L2, shorter latencies and lower standard deviations can be expected in the 
L2, accompanied by slower L1 latencies and larger deviations, 
except in the case of balanced bilinguals, who should show no significant 
difference between their L1 and L2 latencies, nor relative to the L1 latencies of 
monolinguals or of low-proficiency L2 peers. Thirdly, an observable latency effect should be 
apparent for non-balanced participants with an increase in cognitive 
interference, i.e. (a) after switches in the trial language and (b) with added 
cognitive load from speeding up stimulus images. 


4. Data Collection Procedures 


General data collection with new speakers of Wymysorys was designed to 
target a set of research questions and was streamlined into several tasks. A 
bilingual RAN assessment was administered as one task alongside three 
other tasks. Data collection proceeded according to the following protocol: 


1. Sociolinguistic questionnaire 

2. Video elicitation of narrative 

3. RAN assessment 

4. Semi-structured sociolinguistic interview 


This protocol generally lasted less than an hour. All the tasks, with the 
exception of the interview, were administered with the stimulus display 
software OpenSesame (Mathôt, Schreij, & Theeuwes, 2012). In a separate 
protocol, participants also worked in pairs on a director-matcher task, 
wherein a scene of objects was created for a director, who was tasked with 
instructing another participant, the matcher, to reconstruct the scene with an identical set of 
objects. The matcher was not able to see the scene the director was 
describing. Each pair completed the task four times, with each 
individual playing the roles of director and matcher twice each. This task 
produced language data with a higher degree of speaker interaction. 


All sessions were audio-video recorded. A high-definition video 
recorder captured video of the entire meeting. Similarly, an audio field recorder 
captured the audio signal, stored to local media, for the duration of each 
protocol. The field recorder was also hard-lined into the stimulus display 
laptop, which allowed the audio signal from the recorder’s microphones to be 
captured by the stimulus display laptop within each OpenSesame 
experiment. For the RAN assessment, this strategy produced one audio file 
for each response (n=100 per speaker). 


The RAN assessment itself consisted of a sequence of loops wherein 
image stimuli are presented following a cue that informs the speaker which 
language to use when producing a response. Each loop consisted of the 
following events: 


1. Fixation dot and focus “ding” (500ms) A fixation dot appears in the 
center of the screen and a “ding” sound is played in order to focus the 
participant’s attention on the screen in anticipation of the next cue and 
stimulus. The “ding” audio cue served a secondary purpose: to index 
time intervals during the assessment. If the stimulus display laptop 
failed to write the individual file for a response, 
the results of the assessment could be recovered from internal storage on 
the ambient recording devices. 


2. Language cue (500ms) Following the fixation dot, a cue screen 
presented one of two cues, either a Polish flag or the coat of arms of 
the City of Wilamowice. These cues indicated the language in which 
the participant should respond to the stimulus image. Cues were 
presented in a fixed, pseudo-random order throughout the 
task. The order of cue stimuli is indicated in Appendix 1. 

3. Stimulus display (300-500ms) A stimulus image, drawn in random order from a pool of 100 
images, is presented on the screen. The pool of images, chosen from the Bank 
of Standardized Stimuli (Brodeur, Katherine & Maria, 2014), consists of 
everyday items and items featured in didactic materials developed during 
revitalization efforts (fum Dökter et al., 2015; Król et al., 2016). A list of 
image stimuli is provided in Appendix 2. Stimulus display speeds were varied 
and followed a fixed order throughout the task. OpenSesame captures audio 
from the moment the stimulus image is displayed. 

4. Answer screen (4000ms) The stimulus display laptop’s screen is blank 
for four seconds, allowing the participant to respond without distraction. 
The audio stream continues to be captured by the stimulus display 
laptop for the duration of this event. The audio stream is then written to 
a unique file and the next loop begins. 

Figure 1. Example loop in the RAN sequence. 
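
The timing of a single loop can be made concrete with a short sketch. The durations below are taken from the event descriptions; the ordering (fixation, cue, stimulus, blank answer screen) is assumed from the statement that audio capture starts when the stimulus appears and continues through the answer screen. All names (`Event`, `build_loop`, `loop_length_ms`) are hypothetical and are not part of the OpenSesame experiment used in the study.

```python
from dataclasses import dataclass

# Durations in milliseconds, as stated in the loop description.
FIXATION_MS = 500
CUE_MS = 500
ANSWER_MS = 4000
# Stimulus display speeds named in the Figure 2 caption.
STIMULUS_SPEEDS_MS = (300, 400, 500)


@dataclass(frozen=True)
class Event:
    name: str
    onset_ms: int      # time elapsed since the start of the loop
    duration_ms: int


def build_loop(stimulus_ms):
    """Return the four events of one RAN loop with cumulative onsets."""
    if stimulus_ms not in STIMULUS_SPEEDS_MS:
        raise ValueError(f"unexpected stimulus duration: {stimulus_ms}")
    steps = [
        ("fixation_and_ding", FIXATION_MS),
        ("language_cue", CUE_MS),
        ("stimulus_display", stimulus_ms),  # assumed: audio capture starts here
        ("answer_screen", ANSWER_MS),
    ]
    events, onset = [], 0
    for name, duration in steps:
        events.append(Event(name, onset, duration))
        onset += duration
    return events


def loop_length_ms(stimulus_ms):
    """Total duration of one loop for a given stimulus display speed."""
    last = build_loop(stimulus_ms)[-1]
    return last.onset_ms + last.duration_ms
```

Under these assumptions the stimulus always appears one second into the loop, so the recorded “ding” could serve as a fixed reference point for recovering latencies from the ambient recordings.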


Participants were instructed to name each picture according to the cue 
stimulus “as quickly and accurately as possible”. At the beginning of each 
assessment, the participant completed a practice phase, consisting of four 
loops, with two Polish and two Wymysorys cues. Following the practice 
phase, the participant had the opportunity to ask for clarification about the 
instructions or the progression of the assessment. The remainder of the 
assessment proceeded without pause, and presented the participant with an 
additional 56 Wymysorys-cued and 40 Polish-cued stimuli. 


Data were collected with sixteen new speakers in November and 
December 2017, including one reportedly balanced bilingual (the only 
balanced bilingual within the age range of the new speakers, i.e. not elderly). 
The analyses presented here focus on the responses of those speakers 
who were proficient and outgoing enough to complete both the narration 
section and the RAN assessment, since RAN results will be compared with 
proficiency measures derived from spoken data in Section 6. Seven 
participants met these criteria. 


5. Data Analysis Procedures 


Each assessment results in 100 audio responses; practice phase responses 
were not included in the analysis. Other responses deemed to be 
unacceptable were also not included; these amounted to just a handful of 
responses in the data. Most often post-practice phase responses were 
discounted as a result of the participant starting the post-practice phase of 
the assessment while clarification / discussion was still going on. Several 
other instances of interruptions, e.g. an uninvited guest entering the room, 
resulted in unacceptable responses. 


Acceptable responses were then evaluated for accuracy (whether or not 
the participant named the item in the picture). Accuracy was determined 
rather inclusively, i.e. acceptable answers for the loop depicted in Figure 1 
might be “ant”, “insect”, “bug”, or similar. Latencies were then measured 
for accurately named pictures. An initial attempt to automate this 
processing with a Python script proved untenable. The script attempted to 
generate a list of response times for each response, that is, the times at which the audio 
signal crosses a certain threshold, as well as a waveform and spectrogram 
for visual inspection against the response times. However, given the 
different levels of background noise, participant voice tone, and volume, 
even within a single assessment, it was more efficient to analyze the data 
manually than to continuously recalibrate the parameters of the script. 
Response latencies were checked manually for each response using Praat 
(Boersma, 2002). Results were logged to a spreadsheet for further analysis. 
Instances of "um", "uh", and other pre-response vocalizations were not 
considered in the reaction time. Accuracy rates, latencies for correct 
responses, and probability scores calculated for each speaker’s correct 
responses per language are presented in Table 1. Speakers are rank ordered 
according to their accuracy scores. 
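
For illustration, a threshold-based onset detector of the general kind described above might look as follows. This is a minimal sketch and not the script used in the study: the source says only that the script looked for the point where the audio signal crosses a threshold, so the windowed RMS measure, the parameter names, and the default values are all assumptions.

```python
import math


def detect_onset_ms(samples, sample_rate, threshold=0.1,
                    window_ms=10, min_windows=3):
    """Return the time (ms) at which sustained sound first begins, or None.

    samples: amplitudes scaled to the range -1.0..1.0 (assumed format).
    A response onset is taken to be the start of the first run of
    `min_windows` consecutive windows whose RMS amplitude reaches
    `threshold`; shorter bursts such as clicks are ignored.
    """
    window = max(1, int(sample_rate * window_ms / 1000))
    run_start = None
    run_length = 0
    for start in range(0, len(samples) - window + 1, window):
        chunk = samples[start:start + window]
        rms = math.sqrt(sum(x * x for x in chunk) / window)
        if rms >= threshold:
            if run_length == 0:
                run_start = start
            run_length += 1
            if run_length >= min_windows:
                return 1000.0 * run_start / sample_rate
        else:
            run_length = 0  # the run was interrupted; start over
    return None
```

A detector like this has no way of telling a filled pause such as "um" from the target word, and a single fixed threshold fails when background noise and speaking volume vary within a session. Both limitations are consistent with the decision to measure latencies by hand.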





Table 1 

Speaker   Pro   wym A   wym RT   pl A   pl RT    P          Sign. 
ms9907    2     0.2     2045.5   0.93   938.0    3.13e-11   *** 
mz0205    2     0.23    1673.0   0.88   1356.5   0.00107    ** 
an0005    1     0.37    1541.5   0.85   927.0    2.11e-07   *** 
kz9902    6     0.43    1881.5   0.93   1022.0   8.3e-07    *** 
bc8612    2     0.6     1394.0   0.98   822.5    3.09e-08   *** 
jm9303    3     0.73    1669.5   0.83   1741.0   0.582      ø 
tk9307    8     0.96    874.0    0.95   1006.5   0.0697     ø 

Table 1: The columns indicate speakers’ self-ascribed 
productive proficiency in Wymysorys (scale: 0-8), “Pro”; Wymysorys and 
Polish accuracy rates for the RAN assessment, “wym A” and “pl A”; and mean 
latencies in milliseconds for Wymysorys and Polish, “wym RT” and “pl RT”. The probability 
scores, “P”, are calculated based on each speaker’s reaction times per language. 
Significance codes, “Sign.”: 0-0.001 ‘***’, 0.001-0.01 ‘**’, 0.01-0.05 ‘*’, 
0.05-0.1 ‘.’, 0.1-1 ‘ø’. 

As expected, Wymysorys latencies are significantly slower in the five 
speakers with the lowest accuracy ratings in 
Wymysorys. The two speakers with the highest accuracy rates showed no 
significant differences in latencies per language. Interestingly though, 
Polish latencies for speaker jm9303 are noticeably slower than the other 
speakers’. This will be taken up in Section 6. Latencies are plotted by 
speaker and language in Figure 2. 
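
The source does not name the statistical test behind the “P” column, so the following is only one possible way to compare a single speaker’s Wymysorys and Polish latencies. A permutation test on the difference in means is swapped in here purely as an illustration; the function names and parameters are hypothetical.

```python
import random
from statistics import mean


def accuracy_rate(outcomes):
    """Share of acceptable responses in which the picture was named.

    outcomes: sequence of booleans, True where the item was named.
    """
    return sum(outcomes) / len(outcomes)


def permutation_p_value(latencies_a, latencies_b,
                        n_permutations=10000, seed=0):
    """Two-sided permutation test on the difference in mean latency.

    This is an illustrative stand-in, not the test reported in Table 1.
    """
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    observed = abs(mean(latencies_a) - mean(latencies_b))
    pooled = list(latencies_a) + list(latencies_b)
    n_a = len(latencies_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_permutations + 1)
```

Because only accurately named pictures yield a latency, speakers with low Wymysorys accuracy contribute few Wymysorys data points. A resampling approach makes no distributional assumptions about such small samples, which is the reason for choosing it in this sketch.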


Contrary to expectations, however, neither stimulus display speed nor 
the order of stimuli produced a significant effect. Probability scores are 
provided in Table 2. 


6. Discussion 


Generally speaking, speakers’ per-language latencies are close to what was 
expected based on the relevant literature. A general down-sloping trend is 
visible. 
Table 2
Probability Scores for the Effect of Stimulus Speed and Order in
Wymysorys-Cued Responses

Speaker   stimulus speed   order
ms9907    0.254            0.209
mz0205    0.641            0.678
an0005    0.454            0.637
kz9902    0.717            0.68
bc8612    0.54             0.765
jm9303    0.542            0.149
tk9307    0.752            0.0781

Figure 2: RAN latencies per language. Individual speakers are indexed
on the x-axis; the y-axis shows the latency value in milliseconds. Red box
plots illustrate speaker latencies for Polish-cued stimuli. The blue ribbon
and line illustrate the standard deviation and mean, respectively, for
Wymysorys-cued stimuli. A point for each Wymysorys response is overlaid
on the plot, further illustrating whether the previous loop was cued for the
same language (= triangle) and the speed of the image stimulus
(500 ms = pink, 400 ms = green, 300 ms = blue).


In the Wymysorys-cued responses, latencies are faster with increased
accuracy. The hypothesized narrowing of standard deviations is less clear;
nevertheless, they are certainly narrower in the balanced participant
(tk9307) than in the rest.


Another striking feature of the data is the difference in latencies between
speakers tk9307 and jm9303. Neither speaker’s response times are
significantly different per language, though they are different from each
other. (A full comparison of speakers’ performances relative to each other
can be found in Figure 3.) This can be explained in terms of entrenchment.
The first speaker, tk9307, displays no significant differences between his
Wymysorys and Polish latencies, or between his latencies and most of the
other participants’ Polish latencies, which suggests he is able to suppress
non-target representations without hesitation, supporting his
self-categorisation as a balanced bilingual (Costa & Santesteban, 2004;
van Hell & Tanner, 2012). In contrast, the second speaker, jm9303, despite
a very good accuracy score and excellent abilities in Wymysorys, was
significantly slower in both languages, suggesting a still-ongoing
reorganization of linguistic representation in the cognitive processing of
language. In this case, the strength of entrenchment is such that
suppressing non-target elements is costly for the L1 (Costa & Santesteban,
2004; Gullifer et al., 2013).


Figure 3: Distance matrix comparing speaker latencies per language.
The value for generating the color gradient for each tile is the P-value
generated in an ANOVA comparing latencies per speaker and language.
The upper-left section displays differences among speakers in Wymysorys.
The bottom-right section displays differences among speakers in Polish.
Actual numerical values are given in Appendix 3.


Other potential anomalies
are explained via knowledge of the speakers’ engagement. The Wymysorys
latencies of speaker kz9902 are relatively slow given his place in the rank
order, as well as his own self-assessment. However, this speaker had been
relatively unengaged with the language activities and the group of new
speakers in general in the months leading up to his participation in the study.
Since the automaticity from language entrenchment is something achieved
by regular repetition and rehearsal, it could be postulated that his latencies
were affected by atrophy. Another speaker, mz0205, produced appropriate
Wymysorys latencies based on her self-assessment of proficiency and her
accuracy rank order; however, her Polish latencies are significantly slower
than the rest (except jm9303). In contrast to the previously discussed
speaker, she is actively involved in regular Wymysorys language-related
activities. She also discussed some difficulties with foreign language usage
(or language block) in general during interviews. And although she didn’t
say much or speak very quickly during the narration task, her utterances
appeared deliberate and structurally accurate. This suggests a potentially
higher degree of Wymysorys entrenchment than indicated by the rank order,
causing a similar suppression cost as in speaker jm9303.


One possible criticism of the study concerns the creation of a rank order
based on Wymysorys accuracy rates. It could, of course, be that participants
were just unlucky with the selection of stimuli. An attempt has been made to
mitigate this by choosing stimuli that represent everyday items and/or are
represented in didactic materials utilized by new speakers (fum Dökter et
al., 2015; Krol et al., 2016). Nevertheless, the accuracy rank order is
compared with other hypothesized measures of proficiency, such as
speakers’ own self-assessment, speech rate, “um” rate, and lexical diversity*
(Polinsky, 2008; Irizarri van Suchtelen, 2016). These measures were
performed on each speaker’s narration of the well-known “Pear film”;
results are presented in Table 3.








None of these additional measurements, purported to be indicative of
proficiency, replicate the rank order derived from Wymysorys accuracy, nor
do they provide a better fit to the latency slope. While it is suspect that
none of these measures replicate the same rank order, they have not yet been
fully tested here, especially with regard to the minimum amount of data
needed to achieve consistent results. Just one six-minute stimulus probably
does not provide an adequate supply of data. Nevertheless, it seems prudent
to continue with multiple types of assessments so as not to misrank the
speaker who, for example, just says “um” a lot, or just got unlucky with the
selection of stimuli, but otherwise uses the language in a highly proficient
way.


* The potential redundancy between lexical diversity and accuracy rate is
recognized. Both are indications of vocabulary size and relate to
entrenchment in the same way.


Speaker   Pro   wpm1    wpm2   um tok   um tim        tot. words   lex. div
ms9907    2     111.4   52.4   45       38.38         312          0.28
an0005    1     96.6    23.9   23       [illegible]   142          0.39
kz9902    6     165.2   75.8   4        3.86s         451          0.36
bc8612    2     116.8   66.7   66       57.1s         397          0.28
jm9303    3     119.3   71.6   14       7.78          484          0.37

Table 3: Comparison of proposed proficiency measures. Speakers are
organized in their Wymysorys accuracy-based rank order. “Pro” is each
speaker’s self-assessment for Wymysorys productive proficiency
(scale: 0-8). Words per minute are calculated based on the duration of the
entire stimulus film (“wpm1”) and based on the speaker’s actual talking
time, with silences longer than 0.5 s omitted (“wpm2”). “um tok” lists the
number of “um” tokens produced by each speaker, while “um tim” is the
summed duration of the speaker’s occurrences of “um”. The total number of
words each speaker used in narrating the film is indicated under “tot.
words”, and lexical diversity (“lex. div”) is the number of unique words
divided by the total number of words. N.B. Despite completing the other
films in the narration task, speaker mz0205 froze during this particular
stimulus and is thus omitted from the table.
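The narration measures defined in the caption of Table 3 can be sketched as follows. The input layout (a list of token and duration pairs), the function and key names, and the choice to exclude “um” from the word count are assumptions made for illustration; the source does not say whether fillers count toward total words. The two speech rates are named after their time base rather than by column label.

```python
def narration_measures(tokens, film_seconds, talking_seconds, filler="um"):
    """Compute speech rate, filler, and lexical diversity measures.

    tokens: list of (text, duration in seconds) pairs, a hypothetical layout.
    Assumption: filler tokens are excluded from the word count.
    """
    fillers = [(text, dur) for text, dur in tokens if text.lower() == filler]
    words = [text.lower() for text, _ in tokens if text.lower() != filler]
    total = len(words)
    return {
        # Words per minute over the full duration of the stimulus film.
        "wpm_film": total / (film_seconds / 60),
        # Words per minute over the speaker's actual talking time.
        "wpm_talking": total / (talking_seconds / 60),
        "um_tokens": len(fillers),
        "um_time": sum(dur for _, dur in fillers),
        "total_words": total,
        # Unique words divided by total words.
        "lexical_diversity": len(set(words)) / total if total else 0.0,
    }


# Invented example narration (not study data).
example_tokens = [
    ("the", 0.2), ("boy", 0.3), ("um", 0.5), ("takes", 0.4),
    ("the", 0.2), ("pears", 0.5), ("um", 0.25),
]
measures = narration_measures(example_tokens, film_seconds=60, talking_seconds=30)
```

Because lexical diversity computed this way falls as a text gets longer, comparing speakers with very different word totals on this ratio alone should be done with caution.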


A final point for discussion addresses the apparent lack of a latency
effect after intentional stressors on cognitive processing, namely switching
cue stimuli and varying stimulus display speeds during the RAN assessment.
One can only speculate at this point about the lack of a latency effect in
the ordering of stimuli. The fact that switching language cues occurred
throughout the test perhaps resulted in quick habituation of the cognitive
resources needed (cf. Schuch & Grange, 2018) to mediate the tasks. This is
perhaps an indication that the study design should be revised, utilizing
alternative chunking patterns, for instance with longer stretches of cues in
one language or the other. Speed of the stimuli was another factor intended
to strain cognitive resources, thus slowing down latencies in less
entrenched language varieties. The complete lack of effect may indicate that
the stimulus speeds (300, 400, and 500 ms) were too close together.


7. Conclusion 


This paper explored the idea of using rapid automatized picture naming tests
as a component in assessing proficiency in endangered language contexts.
This is necessary because the creation and application of traditional
standardized tests in these contexts is problematic for a number of reasons,
outlined in an earlier section. The hypotheses put forward were tested using
data collected among new speakers of Wymysorys.


First, it was hypothesized that accuracy rates in naming pictures in
Wymysorys are representative of vocabulary size in the language and can be
taken as a proxy for general amount of exposure to the language. A
satisfactory rank order of participants was created based on these accuracy
rates, though this order was not recreated in either speaker self-assessment
of productive proficiency or other measures implemented on spoken data,
such as speech rate, “um” rate, or lexical diversity. Secondly, the data
support the hypothesis that reaction times can be used as a tool for assessing
proficiency as a function of language entrenchment. Lower-proficiency
speakers all showed significant differences in latencies per language. The 
two higher-proficiency speakers showed no significant differences in 
latencies between languages, and while one speaker with balanced high 
proficiency produced no difference in latencies compared to the L1 of 
lower-proficiency participants, the other unbalanced, high-proficiency 
speaker’s results illustrated the cognitive load of mediating languages with 
differential entrenchment. An attempt to stress the cognitive processing and 
thereby slow down latencies in less proficient participants via regular 
switching of language cues and alternating the speed of stimuli failed across 
the group of study participants, contrary to expectations based on previous 
studies. This suggests that procedural aspects of the assessment described 
here should be revised in future research. 


In addition to these revisions, the assessment described here should 
undergo further testing. At the time of this draft, a second test following the 
same protocols has been administered with the same cohort of Wymysorys
new speakers; analysis of these tests will be presented sometime in the 
future. However, at present Wymysorys new speakers make up an 
extremely small group and the assessment described here must certainly be 
tested on larger groups of new speakers of endangered languages and 
control groups of larger-language bilinguals, before a suggestion of more 
generalizable effectiveness could be posited with some degree of certainty. 
It is worthwhile to continue developing proficiency assessments that do not 
rely on prescriptivist notions of the language in question. Such assessments 
can be employed in small, endangered language contexts, where scarcity of 
human resources and the potential lack of standardization do not allow for 
more traditional proficiency assessments to be performed. Alternative 
assessments that measure proficiency as a function of language 
entrenchment, such as RAN, also bring us closer to understanding the 
relationship between proficiency and language acquisition, on the one hand, 
and language variation and change, on the other. 